Skip to content

prov/efa: Introduce PPS enhancement interface#12225

Open
shijin-aws wants to merge 6 commits into
ofiwg:mainfrom
shijin-aws:pps
Open

prov/efa: Introduce PPS enhancement interface#12225
shijin-aws wants to merge 6 commits into
ofiwg:mainfrom
shijin-aws:pps

Conversation

@shijin-aws
Copy link
Copy Markdown
Contributor

A series of commit that introduce an operation level interface to allow user enhance the packet per second (PPS) .

@shijin-aws shijin-aws requested a review from a team May 12, 2026 01:53
@shijin-aws
Copy link
Copy Markdown
Contributor Author

@mrgolin can u take a look

}


if (efa_env.enable_high_pps && (flags & FI_EFA_WR_HIGH_PPS))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the efa_env.enable_high_pps will be dropped eventually ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is a temporary to gate this feature before the firmware deployment is fully completed

Comment thread prov/efa/src/efa_data_path_direct_internal.h Outdated
Comment thread prov/efa/test/efa_unit_test_data_path_direct.c
Comment thread prov/efa/src/efa_io_defs.h
Comment thread prov/efa/src/efa_base_ep.c
Comment thread prov/efa/test/efa_unit_test_rma.c Outdated
@shijin-aws shijin-aws force-pushed the pps branch 3 times, most recently from 0fda097 to 94ff076 Compare May 16, 2026 01:43

@pytest.mark.fabric(params=["efa", "efa-direct"])
@pytest.mark.message_sizes(default_efa=PERF_SIZES, default_efa_direct=DIRECT_RMA_SIZES,
pr_ci_efa=PERF_PR_CI, pr_ci_efa_direct=DIRECT_RMA_SIZES)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need pr_ci_efa=PERF_PR_CI, pr_ci_efa_direct=DIRECT_RMA_SIZES if not @pytest.mark.pr_ci

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see all the tests above have such decorator, even if most of them don't have pytest.mark.pr_ci. I do want it to get run within PR CI, what is the suggested decorators exactly? @charlesstoll

Comment thread prov/efa/src/fi_ext_efa.h Outdated


/*
* EFA provider-specific operation flags (bits 60-63).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: a sentence directing to flags 0-59 could be useful

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

}


if (efa_env.enable_processing_hints)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would FI_EFA_WR_HIGH_PPS ever be set if enable_processing_hints is not?
maybe just check for the first here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FI_EFA_WR_HIGH_PPS should always be exposed to applications. So we need the env check as a runtime gate. And I would add a comment that such env is only a temporary gate before the fw deployment and we should remove it after the deployment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the environment variable as a gate as a separate commit f8124ce, and it should be reverted finally after the deployment.

{
if (flags & FI_EFA_WR_HIGH_PPS) {
uint8_t wqe_hints = 0;
wqe_hints |= EFA_IO_PROCESSING_HINT_BURST_PPS_SENSITIVE;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like you're preparing this function for more potential hints in the future, which I'm not sure we need to at this point. But if you do, then wqe_hints declaration and EFA_SET should be outside the HIGH_PPS if blocjk

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My old implementation was just calling an EFA_SET(...., 1) when (flags & FI_EFA_WR_HIGH_PPS). Then I saw rdma-core's implementation is assuming that can be extended so I followed.
I also thought about making that wqe_hint and EFA_SET outside the if block but I think that can cause them always called for all flags which is not necessary

Copy link
Copy Markdown
Contributor Author

@shijin-aws shijin-aws May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code to only call the whole processing hint function when (flags & FI_EFA_WR_HIGH_PPS)

Comment thread prov/efa/src/efa_data_path_ops.h Outdated
ibv_wr_set_ud_addr(qp->ibv_qp_ex, ah->ibv_ah, qpn, qkey);

#if HAVE_EFADV_WR_PROCESSING_HINTS
if (efa_env.enable_processing_hints && (flags & FI_EFA_WR_HIGH_PPS))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can enable_processing_hints be set if HAVE_EFADV_WR_PROCESSING_HINTS is not defined? maybe the macro check is redundant?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HAVE_EFADV_WR_PROCESSING_HINTS only protects rdma-core interface, but we don't need a new rdma-core interface to enable processing hints, for data path direct code path. So these two are orthogonal

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same above, I made the environment variable as a gate as a separate commit f8124ce, and it should be removed finally after the deployment.

talavr-amazon
talavr-amazon previously approved these changes May 18, 2026
* such as the FI_EFA_WR_HIGH_PPS flag. It currently supports write,
* writedata, and read operations.
*
* Unlike fi_rma_bw, this test uses a nonblocking benchmark loop that
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make this test fi_rma_bw_non_blocking?

@j-xiong do you have an opinion? Will this test be useful for other providers?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it is useful in general.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was implementing some efa specific flags, I can do some refactor to make it ported to common code for the non-blocking pattern. But it will require more work

Define FI_EFA_WR_HIGH_PPS as a provider-specific operation
flag (bit 60) in fi_ext_efa.h. Applications pass this flag
in fi_writemsg() to hint the device to optimize for higher
message rate on RDMA write operations.

Signed-off-by: Shi Jin <sjina@amazon.com>
In efa_data_path_direct_post_write(), set the processing hint
bit in the TX WQE metadata ctrl3 field when both
efa_env.enable_high_pps and FI_EFA_WR_HIGH_PPS are set. Add
efa_send_wr_set_processing_hint_high_pps() helper and the
EFA_IO_TX_META_DESC_PROCESSING_HINT_MASK definition for the
new ctrl3 field in efa_io_defs.h.

Signed-off-by: Shi Jin <sjina@amazon.com>
Implement the rdma-core data path integration for the high PPS
processing hint on RDMA write operations. During QP creation, set
EFADV_WR_EX_WITH_PROCESSING_HINT in wr_flags to enable the WQE-level
hint setter. In efa_ibv_post_write(), call efadv_wr_set_processing_hint()
when both the feature flag and FI_EFA_WR_HIGH_PPS are set.

Add configure checks for efadv_qp_from_ibv_qp_ex, wr_flags, and
EFADV_WR_PROCESSING_HINT_BURST_PPS_SENSITIVE. All changes are guarded
by HAVE_EFADV_WR_PROCESSING_HINT for backward compatibility with older
rdma-core versions.

Signed-off-by: Shi Jin <sjina@amazon.com>
Add fi_efa_rma_bw, an EFA-specific RMA bandwidth test that
supports write and writedata operations with EFA-specific
features such as the FI_EFA_WR_HIGH_PPS flag.

Unlike fi_rma_bw, this test uses a nonblocking benchmark loop
that interleaves posting and completion polling to keep the
pipeline full, similar to the approach used by
rdma-core/perftest. This avoids blocking at window boundaries
and maximizes throughput.

Signed-off-by: Shi Jin <sjina@amazon.com>
Disable shm because it is a efa-only test.

Signed-off-by: Shi Jin <sjina@amazon.com>
Add enable_high_pps field to efa_env struct, gated by the
undocumented FI_EFA_ENABLE_HIGH_PPS environment variable.
This allows controlled rollout of PPS optimization before
firmware deployment is complete.

Signed-off-by: Shi Jin <sjina@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants